Supermarket Sales Analysis Project - Profile Report
In [16]:
import pandas as pd
from ydata_profiling import ProfileReport
In [17]:
# read data
sales_data = pd.read_csv('data/supermarket_sales.csv', delimiter=',', decimal='.')
In [18]:
sales_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 17 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   Invoice ID               1000 non-null   object
 1   Branch                   1000 non-null   object
 2   City                     1000 non-null   object
 3   Customer type            1000 non-null   object
 4   Gender                   1000 non-null   object
 5   Product line             1000 non-null   object
 6   Unit price               1000 non-null   float64
 7   Quantity                 1000 non-null   int64
 8   Tax 5%                   1000 non-null   float64
 9   Total                    1000 non-null   float64
 10  Date                     1000 non-null   object
 11  Time                     1000 non-null   object
 12  Payment                  1000 non-null   object
 13  cogs                     1000 non-null   float64
 14  gross margin percentage  1000 non-null   float64
 15  gross income             1000 non-null   float64
 16  Rating                   1000 non-null   float64
dtypes: float64(7), int64(1), object(9)
memory usage: 132.9+ KB
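The `info()` output lists `Date` and `Time` as `object`, which means they were read as plain strings. A minimal sketch of converting them is shown here. The two-row `sample` frame and the `Hour` column are hypothetical, and the formats `%m/%d/%Y` and `%H:%M` are assumptions that should be checked against the actual file before applying this to `sales_data`.

```python
import pandas as pd

# Hypothetical sample rows; the real file's date and time formats are assumed, not confirmed.
sample = pd.DataFrame({"Date": ["1/5/2019", "3/8/2019"], "Time": ["13:08", "10:29"]})

# Parse the date strings into a datetime column.
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y")

# Extract the hour of sale from the time strings (illustrative derived column).
sample["Hour"] = pd.to_datetime(sample["Time"], format="%H:%M").dt.hour

print(sample.dtypes)
```

With real datetime columns, the profile report can treat `Date` as a time variable rather than as high-cardinality text.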
In [19]:
sales_data.describe()
Out[19]:
| | Unit price | Quantity | Tax 5% | Total | cogs | gross margin percentage | gross income | Rating |
|---|---|---|---|---|---|---|---|---|
| count | 1000.000000 | 1000.000000 | 1000.000000 | 1000.000000 | 1000.00000 | 1000.000000 | 1000.000000 | 1000.00000 |
| mean | 55.672130 | 5.510000 | 15.379369 | 322.966749 | 307.58738 | 4.761905 | 15.379369 | 6.97270 |
| std | 26.494628 | 2.923431 | 11.708825 | 245.885335 | 234.17651 | 0.000000 | 11.708825 | 1.71858 |
| min | 10.080000 | 1.000000 | 0.508500 | 10.678500 | 10.17000 | 4.761905 | 0.508500 | 4.00000 |
| 25% | 32.875000 | 3.000000 | 5.924875 | 124.422375 | 118.49750 | 4.761905 | 5.924875 | 5.50000 |
| 50% | 55.230000 | 5.000000 | 12.088000 | 253.848000 | 241.76000 | 4.761905 | 12.088000 | 7.00000 |
| 75% | 77.935000 | 8.000000 | 22.445250 | 471.350250 | 448.90500 | 4.761905 | 22.445250 | 8.50000 |
| max | 99.960000 | 10.000000 | 49.650000 | 1042.650000 | 993.00000 | 4.761905 | 49.650000 | 10.00000 |
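The summary suggests that several columns are derived from one another: `Tax 5%` and `gross income` have identical statistics, and `gross margin percentage` is constant (standard deviation 0). The sketch below checks, using only the mean, min, and max rows copied from the table above, that the figures are consistent with `Tax 5% = 0.05 * cogs` and `Total = cogs + Tax 5%`. The relationships themselves are an inference from the numbers, not something the dataset documents here, and the helper name `check_row` is illustrative.

```python
import math

# Summary rows copied from the describe() output above.
summary = {
    "mean": {"tax": 15.379369, "total": 322.966749, "cogs": 307.58738},
    "min": {"tax": 0.5085, "total": 10.6785, "cogs": 10.17},
    "max": {"tax": 49.65, "total": 1042.65, "cogs": 993.0},
}

def check_row(row, tol=1e-4):
    """Return (tax_ok, total_ok, margin) for one summary row."""
    tax_ok = math.isclose(row["tax"], 0.05 * row["cogs"], abs_tol=tol)
    total_ok = math.isclose(row["total"], row["cogs"] + row["tax"], abs_tol=tol)
    # Tax 5% equals gross income in the table, so this is gross income / Total.
    margin = 100 * row["tax"] / row["total"]
    return tax_ok, total_ok, margin

results = {name: check_row(row) for name, row in summary.items()}
for name, (tax_ok, total_ok, margin) in results.items():
    print(f"{name}: tax_ok={tax_ok} total_ok={total_ok} margin={margin:.6f}")
```

If these relationships hold row by row, the margin is always 5/105 of the total, about 4.761905%, which would explain the constant column. In that case `Tax 5%`, `gross income`, `Total`, and `cogs` carry the same information up to scaling, so the profile report should be expected to flag them as highly correlated.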
In [20]:
profile = ProfileReport(sales_data, title="Pandas Profiling Report")
profile